[WebAssembly] Add extra pattern for dot #151775

badumbatish · 2025-08-01T21:45:50Z

Fixes #50154

llvmbot · 2025-08-01T21:46:27Z

@llvm/pr-subscribers-backend-webassembly

Author: Jasmine Tang (badumbatish)

Changes

Fixes #50154

Full diff: https://github.com/llvm/llvm-project/pull/151775.diff

2 Files Affected:

(modified) llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp (+51)
(added) llvm/test/CodeGen/WebAssembly/simd-dot-reductions.ll (+21)

diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index cd434f7a331e4..648e3b6b2b440 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -192,6 +192,9 @@ WebAssemblyTargetLowering::WebAssemblyTargetLowering(
     // Combine wide-vector muls, with extend inputs, to extmul_half.
     setTargetDAGCombine(ISD::MUL);
 
+    // Combine add with vector shuffle of muls to dots
+    setTargetDAGCombine(ISD::ADD);
+
     // Combine vector mask reductions into alltrue/anytrue
     setTargetDAGCombine(ISD::SETCC);
 
@@ -3436,6 +3439,52 @@ static SDValue performSETCCCombine(SDNode *N,
   return SDValue();
 }
 
+static SDValue performAddCombine(SDNode *N, SelectionDAG &DAG) {
+  assert(N->getOpcode() == ISD::ADD);
+  EVT VT = N->getValueType(0);
+  SDValue N0 = N->getOperand(0), N1 = N->getOperand(1);
+
+  if (VT != MVT::v4i32)
+    return SDValue();
+
+  auto IsShuffleWithMask = [](SDValue V, ArrayRef<int> ShuffleValue) {
+    if (V.getOpcode() != ISD::VECTOR_SHUFFLE)
+      return SDValue();
+    if (cast<ShuffleVectorSDNode>(V)->getMask() != ShuffleValue)
+      return SDValue();
+    return V;
+  };
+  auto ShuffleA = IsShuffleWithMask(N0, {0, 2, 4, 6});
+  auto ShuffleB = IsShuffleWithMask(N1, {1, 3, 5, 7});
+  // Both add operands must be shuffles with the expected even/odd masks.
+  if (!ShuffleA || !ShuffleB)
+    return SDValue();
+
+  if (ShuffleA.getOperand(0) != ShuffleB.getOperand(0) ||
+      ShuffleA.getOperand(1) != ShuffleB.getOperand(1))
+    return SDValue();
+
+  auto IsMulExtend =
+      [](SDValue V, WebAssemblyISD::NodeType I) -> std::pair<SDValue, SDValue> {
+    if (V.getOpcode() != ISD::MUL)
+      return {};
+
+    auto V0 = V.getOperand(0), V1 = V.getOperand(1);
+    if (V0.getOpcode() != I || V1.getOpcode() != I)
+      return {};
+    return {V0.getOperand(0), V1.getOperand(0)};
+  };
+
+  auto [LowA, LowB] =
+      IsMulExtend(ShuffleA.getOperand(0), WebAssemblyISD::EXTEND_LOW_S);
+  auto [HighA, HighB] =
+      IsMulExtend(ShuffleA.getOperand(1), WebAssemblyISD::EXTEND_HIGH_S);
+
+  if (!LowA || !LowB || !HighA || !HighB || LowA != HighA || LowB != HighB)
+    return SDValue();
+
+  return DAG.getNode(WebAssemblyISD::DOT, SDLoc(N), MVT::v4i32, LowA, LowB);
+}
 static SDValue performMulCombine(SDNode *N, SelectionDAG &DAG) {
   assert(N->getOpcode() == ISD::MUL);
   EVT VT = N->getValueType(0);
@@ -3558,5 +3607,7 @@ WebAssemblyTargetLowering::PerformDAGCombine(SDNode *N,
   }
   case ISD::MUL:
     return performMulCombine(N, DCI.DAG);
+  case ISD::ADD:
+    return performAddCombine(N, DCI.DAG);
   }
 }
diff --git a/llvm/test/CodeGen/WebAssembly/simd-dot-reductions.ll b/llvm/test/CodeGen/WebAssembly/simd-dot-reductions.ll
new file mode 100644
index 0000000000000..7ac49794491a1
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/simd-dot-reductions.ll
@@ -0,0 +1,21 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mattr=+simd128 | FileCheck %s
+
+target triple = "wasm32-unknown-unknown"
+define <4 x i32> @dot(<8 x i16> %a, <8 x i16> %b) {
+; CHECK-LABEL: dot:
+; CHECK:         .functype dot (v128, v128) -> (v128)
+; CHECK-NEXT:  # %bb.0:
+; CHECK-NEXT:    local.get 0
+; CHECK-NEXT:    local.get 1
+; CHECK-NEXT:    i32x4.dot_i16x8_s
+; CHECK-NEXT:    # fallthrough-return
+  %sext1 = sext <8 x i16> %a to <8 x i32>
+  %sext2 = sext <8 x i16> %b to <8 x i32>
+  %mul = mul nsw <8 x i32> %sext1, %sext2
+  %shuffle1 = shufflevector <8 x i32> %mul, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
+  %shuffle2 = shufflevector <8 x i32> %mul, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
+  %res = add <4 x i32> %shuffle1, %shuffle2
+  ret <4 x i32> %res
+}
+
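
The equivalence this combine relies on can be checked with a scalar model. The sketch below is not LLVM code: `shuffleAddModel`, `dotModel`, and `wrapAdd` are hypothetical names, and it assumes `i32x4.dot_i16x8_s` sign-extends each i16 lane, multiplies lane-wise, and adds adjacent pairs with 32-bit wraparound.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using I16x8 = std::array<int16_t, 8>;
using I32x8 = std::array<int32_t, 8>;
using I32x4 = std::array<int32_t, 4>;

// Wrapping 32-bit add, done in unsigned arithmetic to avoid signed-overflow UB.
static int32_t wrapAdd(int32_t X, int32_t Y) {
  return static_cast<int32_t>(static_cast<uint32_t>(X) +
                              static_cast<uint32_t>(Y));
}

// Model of the IR in the test: sext, mul, two strided shuffles, add.
I32x4 shuffleAddModel(const I16x8 &A, const I16x8 &B) {
  I32x8 Mul{};
  for (int I = 0; I < 8; ++I)
    Mul[I] = int32_t(A[I]) * int32_t(B[I]); // i16 * i16 always fits in i32

  const std::array<int, 4> EvenMask = {0, 2, 4, 6};
  const std::array<int, 4> OddMask = {1, 3, 5, 7};
  I32x4 Res{};
  for (int I = 0; I < 4; ++I)
    Res[I] = wrapAdd(Mul[EvenMask[I]], Mul[OddMask[I]]);
  return Res;
}

// Model of the assumed i32x4.dot_i16x8_s semantics.
I32x4 dotModel(const I16x8 &A, const I16x8 &B) {
  I32x4 Res{};
  for (int I = 0; I < 4; ++I)
    Res[I] = wrapAdd(int32_t(A[2 * I]) * int32_t(B[2 * I]),
                     int32_t(A[2 * I + 1]) * int32_t(B[2 * I + 1]));
  return Res;
}
```

Under these assumptions the two functions agree lane for lane, including the one overflowing case where all four inputs to a lane are -32768.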

lukel97

Out of curiosity, is it possible to do this as a tablegen pattern? Not that it's necessarily the right thing to do, just wondering if it's easy to do or not!

llvm/test/CodeGen/WebAssembly/simd-dot-reductions.ll

badumbatish · 2025-08-05T02:01:14Z

> Out of curiosity, is it possible to do this as a tablegen pattern? Not that it's necessarily the right thing to do, just wondering if it's easy to do or not!

I actually didn't think that tablegen could handle identical arguments in a pattern. I tried it out just now and I think I might be able to make it work.

sparker-arm · 2025-08-05T07:20:28Z

> i think i might be able to make it work

I'm assuming the 'illegal' types will make this more difficult in tablegen? This approach looks good to me.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

llvm/test/CodeGen/WebAssembly/simd-dot-reductions.ll

lukel97

PR title needs to be updated to reflect that this isn't a combine but adds a ~~fold~~ pattern

EDIT: Typo, sorry!

llvm/test/CodeGen/WebAssembly/simd-dot-reductions.ll

llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td

lukel97 · 2025-08-06T04:40:03Z

> > i think i might be able to make it work
>
> I'm assuming the 'illegal' types will make this more difficult in tablegen? This approach looks good to me.

Good point. It looks like the old combine only operated on MVT::v8i16 values feeding into MVT::v4i32 adds, but I presume we also want to handle i8s that are sext'd to i32, e.g.:

define <4 x i32> @f(<8 x i8> %a, <8 x i8> %b) {
  %sext1 = sext <8 x i8> %a to <8 x i32>
  %sext2 = sext <8 x i8> %b to <8 x i32>
  %mul = mul nsw <8 x i32> %sext1, %sext2
  %shuffle1 = shufflevector <8 x i32> %mul, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %shuffle2 = shufflevector <8 x i32> %mul, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  %res = add <4 x i32> %shuffle1, %shuffle2
  ret <4 x i32> %res
}

And from a quick check it looks like the multiply is hoisted into the narrower type:

Optimized legalized selection DAG: %bb.0 'f:'
SelectionDAG has 46 nodes:
      t2: v16i8 = WebAssemblyISD::ARGUMENT TargetConstant:i32<0>
    t40: v8i16 = WebAssemblyISD::EXTEND_LOW_S t2
      t4: v16i8 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
    t39: v8i16 = WebAssemblyISD::EXTEND_LOW_S t4
  t35: v8i16 = mul t40, t39
  t37: v4i32 = WebAssemblyISD::EXTEND_HIGH_S t35
  t36: v4i32 = WebAssemblyISD::EXTEND_LOW_S t35
    t0: ch,glue = EntryToken
      t77: v4i32 = WebAssemblyISD::SHUFFLE t36, t37, Constant:i32<0>, Constant:i32<1>, Constant:i32<2>, Constant:i32<3>, Constant:i32<8>, Constant:i32<9>, Constant:i32<10>, Constant:i32<11>, Constant:i32<16>, Constant:i32<17>, Constant:i32<18>, Constant:i32<19>, Constant:i32<24>, Constant:i32<25>, Constant:i32<26>, Constant:i32<27>
      t60: v4i32 = WebAssemblyISD::SHUFFLE t36, t37, Constant:i32<4>, Constant:i32<5>, Constant:i32<6>, Constant:i32<7>, Constant:i32<12>, Constant:i32<13>, Constant:i32<14>, Constant:i32<15>, Constant:i32<20>, Constant:i32<21>, Constant:i32<22>, Constant:i32<23>, Constant:i32<28>, Constant:i32<29>, Constant:i32<30>, Constant:i32<31>
    t30: v4i32 = add t77, t60
  t31: ch = WebAssemblyISD::RETURN t0, t30
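
The sixteen constants on each WebAssemblyISD::SHUFFLE node in this dump are byte indices into the two concatenated 16-byte operands. A minimal sketch of how a 4-lane i32 mask expands to those bytes, assuming 4 bytes per lane and lane indices 0..7 spanning both operands (the helper name `expandLaneMaskToBytes` is hypothetical, not an LLVM function):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Expand a 4-lane i32 shuffle mask into 16 byte indices.
// Lane indices 0..3 select from the first operand, 4..7 from the second.
std::array<uint8_t, 16>
expandLaneMaskToBytes(const std::array<int, 4> &LaneMask) {
  constexpr int BytesPerLane = 4;
  std::array<uint8_t, 16> Bytes{};
  for (int Lane = 0; Lane < 4; ++Lane)
    for (int B = 0; B < BytesPerLane; ++B)
      Bytes[Lane * BytesPerLane + B] =
          static_cast<uint8_t>(LaneMask[Lane] * BytesPerLane + B);
  return Bytes;
}
```

With the mask {0, 2, 4, 6} this yields bytes 0-3, 8-11, 16-19, 24-27, which matches t77; {1, 3, 5, 7} yields 4-7, 12-15, 20-23, 28-31, which matches t60.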

We could add a second tablegen pattern for (add (shuffle (sext mul)) (shuffle (sext mul))).
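
One arithmetic point worth keeping in mind for a sext-of-mul pattern: doing the multiply in i16 is exact only because the inputs were extended from i8. The sketch below checks this exhaustively with a scalar model; `mulWrapI16` and `narrowMulIsExactForI8` are hypothetical helper names, and the i16 wraparound is an assumption standing in for a v8i16 `mul` lane.

```cpp
#include <cassert>
#include <cstdint>

// Multiply two i16 values with 16-bit wraparound, as one v8i16 `mul` lane would.
int16_t mulWrapI16(int16_t X, int16_t Y) {
  uint32_t Wide = static_cast<uint32_t>(int32_t(X) * int32_t(Y));
  return static_cast<int16_t>(static_cast<uint16_t>(Wide));
}

// True if, for every pair of i8 inputs, multiplying in i16 and then
// sign-extending gives the same value as multiplying in i32.
bool narrowMulIsExactForI8() {
  for (int A = -128; A <= 127; ++A)
    for (int B = -128; B <= 127; ++B) {
      int32_t Wide = A * B;
      int32_t Narrow = mulWrapI16(int16_t(A), int16_t(B));
      if (Wide != Narrow)
        return false;
    }
  return true;
}
```

The check passes because the largest i8 product, (-128) * (-128) = 16384, still fits in i16. For arbitrary i16 inputs the narrow multiply can wrap (256 * 256 becomes 0), so a pattern rooted at a v8i16 `mul` would presumably need to confirm that its operands are extensions from i8.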

Or we could try and do this again as a dagcombine, but it looks like the multiply is already hoisted by the time the types are legalized so we would have to handle the two different patterns anyway:

Type-legalized selection DAG: %bb.0 'f:'
SelectionDAG has 26 nodes:
      t2: v16i8 = WebAssemblyISD::ARGUMENT TargetConstant:i32<0>
    t40: v8i16 = WebAssemblyISD::EXTEND_LOW_S t2
      t4: v16i8 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
    t39: v8i16 = WebAssemblyISD::EXTEND_LOW_S t4
  t35: v8i16 = mul t40, t39
  t36: v4i32 = WebAssemblyISD::EXTEND_LOW_S t35
  t37: v4i32 = WebAssemblyISD::EXTEND_HIGH_S t35
    t0: ch,glue = EntryToken
        t12: i32 = extract_vector_elt t36, Constant:i32<0>
        t14: i32 = extract_vector_elt t36, Constant:i32<2>
        t16: i32 = extract_vector_elt t37, Constant:i32<0>
        t18: i32 = extract_vector_elt t37, Constant:i32<2>
      t19: v4i32 = BUILD_VECTOR t12, t14, t16, t18
        t21: i32 = extract_vector_elt t36, Constant:i32<1>
        t23: i32 = extract_vector_elt t36, Constant:i32<3>
        t25: i32 = extract_vector_elt t37, Constant:i32<1>
        t27: i32 = extract_vector_elt t37, Constant:i32<3>
      t28: v4i32 = BUILD_VECTOR t21, t23, t25, t27
    t30: v4i32 = add t19, t28
  t31: ch = WebAssemblyISD::RETURN t0, t30
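
The BUILD_VECTOR operands in this dump follow from splitting each 8-wide mask index across the two legal v4i32 halves. A small sketch of that index mapping, assuming the low half holds elements 0..3 and the high half holds elements 4..7 (`Half`, `LaneRef`, and `splitMask` are hypothetical names for illustration only):

```cpp
#include <array>
#include <cassert>

enum class Half { Low, High };

// One element of a split mask: which half it reads and which lane of that half.
struct LaneRef {
  Half Source;
  int Lane;
  friend bool operator==(const LaneRef &A, const LaneRef &B) {
    return A.Source == B.Source && A.Lane == B.Lane;
  }
};

// Map each index of an 8-wide shuffle mask onto (half, lane-within-half).
std::array<LaneRef, 4> splitMask(const std::array<int, 4> &Mask) {
  std::array<LaneRef, 4> Out{};
  for (int I = 0; I < 4; ++I)
    Out[I] = {Mask[I] < 4 ? Half::Low : Half::High, Mask[I] % 4};
  return Out;
}
```

For {0, 2, 4, 6} this gives (Low, 0), (Low, 2), (High, 0), (High, 2), which lines up with t12, t14, t16, and t18 reading from t36 (EXTEND_LOW_S) and t37 (EXTEND_HIGH_S).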

badumbatish · 2025-08-07T19:10:27Z

godbolt for this https://godbolt.org/z/Y1rrnW5h3. The IR at the isel phase looks a bit different. Would you want me to try that in this PR as well?

lukel97

LGTM with the PR title updated

I think we can handle the zext from i8 case in a separate PR if the pattern ends up being somewhat different anyway, but I'll defer to @sparker-arm on this! (I don't have a strong opinion on whether we do this as a combine or a tablegen pattern)

llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td

badumbatish · 2025-10-13T15:55:04Z

hi @sparker-arm, i'm looking to add a similar pattern for relaxed simd dot. can you confirm whether this approach is good to merge and to reuse in the subsequent pr?

sparker-arm

Sorry, I didn't know you were waiting on me, please feel free to shout sooner next time!

badumbatish · 2025-10-13T17:27:04Z

no worries, i'll keep that in mind next time, ty for the reviews!

Fixes llvm#50154

badumbatish added 2 commits August 1, 2025 14:41

Precommit test

4d304c8

Added combine support for dot

cb9aac0

llvmbot added the backend:WebAssembly label Aug 1, 2025

Fix merge conflict from main

6553946

badumbatish requested review from lukel97 and sparker-arm August 4, 2025 16:59

lukel97 reviewed Aug 4, 2025

llvm/test/CodeGen/WebAssembly/simd-dot-reductions.ll (outdated)

sparker-arm reviewed Aug 5, 2025

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp (outdated)

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp (outdated)

llvm/test/CodeGen/WebAssembly/simd-dot-reductions.ll (outdated)

badumbatish added 2 commits August 5, 2025 10:38

Transition to tablegen for pattern

86fe99b

Addresses PR reviews

34f58f1

lukel97 reviewed Aug 6, 2025

Fix PR reviews

78965f0

lukel97 approved these changes Aug 8, 2025

llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td (outdated)

badumbatish changed the title ~~[WebAssembly] Add fold support for dot~~ [WebAssembly] Add extra pattern for dot Aug 11, 2025

Clarify dot pattern comment

75e0c38

sparker-arm approved these changes Oct 13, 2025

badumbatish merged commit 55d4e92 into llvm:main Oct 13, 2025
9 checks passed

badumbatish mentioned this pull request Oct 13, 2025

[WebAssembly] [Codegen] Add patterns for relaxed dot #163266

Open

akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025

[WebAssembly] Add extra pattern for dot (llvm#151775)

77a3db1

Fixes llvm#50154
